Performance optimization strategy of distributed storage for industrial time series big data based on HBase
Li YANG, Jianting CHEN, Yang XIANG
Journal of Computer Applications    2023, 43 (3): 759-766.   DOI: 10.11772/j.issn.1001-9081.2022020211

In automated industrial scenarios, the volume of time series log data generated by large numbers of industrial devices has grown explosively, and business demand for access to that data has grown with it. HBase, a distributed column-family database, can store industrial time series big data, but existing strategies do not meet the specific access requirements of such data well, because they ignore the correlation between the data and the access behavior characteristics of specific business scenarios. To address this problem, a distributed storage performance optimization strategy for massive industrial time series data was proposed, built on the distributed storage system HBase and exploiting the correlation between data and access behavior characteristics in industrial scenarios. To handle the load skew caused by the characteristics of industrial time series data, a load balancing optimization strategy based on hot and cold data partitioning and access behavior classification was proposed: the data were classified as hot or cold by a Logistic Regression (LR) model, and the hot data were distributed across different nodes. In addition, to further reduce cross-node communication overhead in the storage cluster and improve the query efficiency of the high-dimensional index of industrial time series data, a strategy of placing the index and the main data in the same Region was proposed: by designing the index RowKey fields and splicing rules, each index entry was stored in the same Region as its corresponding main data. Experimental results on real industrial time series data show that, after the optimization strategy is introduced, the degree of skew in data load distribution is reduced by 28.5% and query efficiency is improved by 27.7%. This demonstrates that the proposed strategy can effectively mine access patterns for specific time series data, distribute load reasonably, reduce data access overhead, and meet the access requirements of specific time series big data.
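
The two ideas in the abstract, scoring data as hot or cold with a logistic model and splicing RowKeys so that an index entry lands in the same Region as its main row, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the RowKey layout, the features, or the model weights, so the field order, the `|` separator, the two-digit bucket prefix, the hourly spreading of hot devices, and every function name below are assumptions made for illustration. The sketch also assumes the table is pre-split on the two-digit prefixes, so that all keys sharing a prefix fall into one Region.

```python
import hashlib
import math

NUM_BUCKETS = 8  # hypothetical number of pre-split Regions ("00" .. "07")


def hot_probability(features, weights, bias):
    """Logistic regression score sigmoid(w . x + b); weights are assumed given."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def bucket_prefix(device_id, is_hot, timestamp):
    """Two-digit prefix that decides which key range (Region) a row sorts into.

    Cold devices hash on device_id alone, so their rows stay together.
    Hot devices also hash on the hour, so their rows spread over buckets.
    """
    key = device_id
    if is_hot:
        key += ":" + str(timestamp // 3600)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return "%02d" % (int(digest, 16) % NUM_BUCKETS)


def data_rowkey(prefix, device_id, timestamp):
    """RowKey of a main data row (hypothetical layout)."""
    return "|".join([prefix, "d", device_id, "%010d" % timestamp])


def index_rowkey(prefix, field, value, device_id, timestamp):
    """RowKey of an index entry; it reuses the main row's prefix so both
    keys sort into the same prefix range, and hence the same Region."""
    return "|".join([prefix, "i", field, value, device_id, "%010d" % timestamp])


if __name__ == "__main__":
    # Made-up access features (e.g. recent read rate, recency) and weights.
    p = hot_probability([0.9, 0.7], weights=[2.0, 1.5], bias=-1.0)
    ts = 1672531200
    prefix = bucket_prefix("pump-17", p >= 0.5, ts)
    print(data_rowkey(prefix, "pump-17", ts))
    print(index_rowkey(prefix, "status", "ALARM", "pump-17", ts))
```

Spreading hot rows across buckets trades read locality for write balance: a range query on a hot device would have to fan out over several prefixes, so a real design would need to weigh this against the access patterns it observes.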
